{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3c95a48bc526d8f3",
   "metadata": {},
   "source": [
    "# Chapter 40: Proximal Policy Optimization (PPO)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6438df9d",
   "metadata": {},
   "source": [
    "## Exercise 40.1"
   ]
  },
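  {
   "cell_type": "markdown",
   "id": "5b03583b",
   "metadata": {},
   "source": [
    "&emsp;&emsp;The policy gradient algorithms REINFORCE, REINFORCE with baseline, actor-critic, TRPO, and PPO were developed step by step. Summarize the main improvement each algorithm makes over its predecessors."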
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a24401c0",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a26f336d",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3abf0c86",
   "metadata": {},
   "source": [
    "**Solution steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bcbdb752",
   "metadata": {},
   "source": [
    "## Exercise 40.2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d850e4bc",
   "metadata": {},
   "source": [
    "&emsp;&emsp;The general definition of the Fisher information matrix is\n",
    "$$\n",
    "F(\\theta) = \\mathcal{E}_{p_{\\theta}(x)} \\left[ \\nabla_{\\theta} \\log p_{\\theta}(x) (\\nabla_{\\theta} \\log p_{\\theta} (x))^T \\right]\n",
    "$$\n",
    "Prove that this is equivalent to the definition in Lemma 40.2."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79a2ab8e",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98215627",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed67f6f0",
   "metadata": {},
   "source": [
    "**Solution steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa80866c",
   "metadata": {},
   "source": [
    "## Exercise 40.3"
   ]
  },
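  {
   "cell_type": "markdown",
   "id": "e1360bc2",
   "metadata": {},
   "source": [
    "&emsp;&emsp;Write out the function table for formulas (40.16)\\~(40.17) used in the implementation of the PPO-Clip algorithm, and verify that it is equivalent to the clipped objective function."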
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee9b66ea",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d891b876",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f37a4fa0",
   "metadata": {},
   "source": [
    "**Solution steps:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73518130",
   "metadata": {},
   "source": [
    "## Exercise 40.4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67dedac3",
   "metadata": {},
   "source": [
    "&emsp;&emsp;List the differences between PPO and the deep Q-network (DQN)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fe285d39",
   "metadata": {},
   "source": [
    "**Solution:**  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc51edff",
   "metadata": {},
   "source": [
    "**Approach:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11cb028d",
   "metadata": {},
   "source": [
    "**Solution steps:**  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "340ace35040ef97",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
